LIVAC Synchronous Corpus

LIVAC is an uncommon language corpus that has been dynamically maintained since 1995. Unlike other existing corpora, LIVAC has adopted a rigorous and regular "Windows" approach to processing and filtering massive volumes of media text from representative communities in the Pan-Chinese region, including Hong Kong, Macau, Taipei, Singapore, Shanghai, Beijing, Guangzhou, and Shenzhen.〔Tsou, Benjamin; Lai, Tom; Chan, Samuel; and Wang, William S.-Y. (Eds). (1998). ''Quantitative and Computational Studies on the Chinese Language'' 《漢語計量與計算研究》. Language Information Sciences Research Centre, City University Press.〕 The contents are thus deliberately repetitive in most cases, consisting of textual samples drawn from editorials, local and international news, cross-Strait news, and news on finance, sports and entertainment.〔Tsou, Benjamin, and Kwong, Olivia. (Eds). (2015). ''Journal of Chinese Linguistics Monograph Series No.25: Studies on Corpus Linguistics and Linguistic Corpus in the Chinese Context''. Hong Kong: Chinese University Press.〕 By 2014, more than 550 million characters of news media text had been processed and analyzed, yielding an expanding Pan-Chinese dictionary of 1.7 million words drawn from the Pan-Chinese printed media. Through rigorous computational analysis, LIVAC has also accumulated a large body of accurate and meaningful statistical data on the Chinese language and its speech communities in the Pan-Chinese region, and the results show considerable and important variation.〔Tsou, Benjamin. (2004). "Chinese Language Processing at the Dawn of the 21st Century", in C R Huang and W Lenders (eds) ''Language and Linguistics Monograph Series B: Frontiers in Linguistics I'', pp.189–207. Institute of Linguistics, Academia Sinica.〕〔Tsou, Benjamin, and Kwong, Olivia. (2015). "Some Quantitative and Qualitative Characteristic Features of the Chinese Language". In Wang S.-Y. William and Sun C.-F. (eds) ''Oxford Handbook of Chinese Language and Linguistics''. Oxford University Press.〕
The "Windows" approach is the most representative feature of LIVAC. It allows Chinese media texts from across the Pan-Chinese region to be analyzed quantitatively according to attributes such as location, time and subject domain, which has in turn made possible various comparative studies, applications in information technology, and the development of related innovative applications. LIVAC also supports longitudinal study: Key Word in Context (KWIC) retrieval and comprehensive study of target words, their underlying concepts and their linguistic structures over 19 years, filtered by variables such as region, time span and content domain. The extensive and cumulative data analysis in LIVAC has enabled the compilation of textual databases of proper names, places, organizations and new words, as well as bi-weekly and annual rosters of media figures. Related applications include the establishment of verb and adjective lexicons, the formulation of sentiment indices and related opinion mining to measure and compare the popularity of global media figures in the Chinese media (the LIVAC Annual Pan-Chinese Celebrity Rosters, later renamed the Pan-Chinese Media Personalities Rosters)〔(Pan-Chinese top media celebrities of 2013), City University of Hong Kong, Hong Kong, 02 January 2014.〕〔(HKIEd Announces 2012 LIVAC Pan-Chinese Celebrity Roster of Chinese Media), Hong Kong Institute of Education, Hong Kong, 20 December 2012.〕, and the construction of monthly new word lexicons (the LIVAC Annual Pan-Chinese New Word Rosters).〔(CityU releases 2013 Pan-Chinese New Word Rosters), City University of Hong Kong, Hong Kong, 02 February 2014.〕〔(2012 LIVAC Pan-Chinese New Word Rosters), Hong Kong Institute of Education, Hong Kong, 02 January 2013.〕 On this basis, it has become possible to analyze the emergence, diffusion and transformation of new words and to publish dictionaries of neologisms.
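A KWIC query restricted by window attributes can be illustrated with a short Python sketch. It assumes a hypothetical record type, `WindowText`, carrying region, year, subject domain and raw text; LIVAC's actual storage format and query interface are not described in this article, and the function `kwic` is likewise hypothetical. Because Chinese text is written without spaces, the sketch matches the keyword as a character substring and shows a fixed number of characters of context on each side.

```python
from dataclasses import dataclass
from typing import Iterator, List, Optional, Tuple


@dataclass(frozen=True)
class WindowText:
    """One sampled text (hypothetical schema mirroring region/time/domain windows)."""
    region: str   # e.g. "Hong Kong", "Beijing"
    year: int     # sampling period
    domain: str   # e.g. "editorial", "finance"
    text: str     # unsegmented Chinese text


def kwic(
    corpus: List[WindowText],
    keyword: str,
    width: int = 5,
    region: Optional[str] = None,
    year: Optional[int] = None,
    domain: Optional[str] = None,
) -> Iterator[Tuple[str, str, str, str]]:
    """Yield (region, left context, keyword, right context) for every occurrence
    of `keyword` in texts that pass the optional region/year/domain filters."""
    for doc in corpus:
        if region is not None and doc.region != region:
            continue
        if year is not None and doc.year != year:
            continue
        if domain is not None and doc.domain != domain:
            continue
        start = doc.text.find(keyword)
        while start != -1:
            end = start + len(keyword)
            left = doc.text[max(0, start - width):start]
            right = doc.text[end:end + width]
            yield doc.region, left, keyword, right
            # Resume one character later so overlapping matches are also found.
            start = doc.text.find(keyword, start + 1)


if __name__ == "__main__":
    corpus = [
        WindowText("Hong Kong", 2013, "finance", "本港股市今日上升，股市成交額增加"),
        WindowText("Beijing", 2013, "finance", "股市昨日下跌"),
    ]
    for row in kwic(corpus, "股市", width=2):
        print(row)
```

Counting the yielded rows per region, for example with `collections.Counter(r[0] for r in kwic(corpus, "股市"))`, gives a crude per-region frequency comparison; a real comparison would also normalize by the size of each window.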
==Corpus data processing==
# Accessing media texts, manual input, etc.
# Text unification, including conversion from simplified to traditional Chinese characters, with the results stored in both Big5 and Unicode versions
# Automatic word segmentation
# Automatic alignment of parallel texts
# Manual verification and part-of-speech tagging
# Extraction of words and their addition to regional sub-corpora
# Combination of the regional sub-corpora to update the LIVAC corpus and its master lexical database
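Several of the processing steps above can be sketched for a single region in Python. This is a minimal illustration under stated assumptions, not LIVAC's implementation: the conversion table `S2T` and the lexicon `LEXICON` are tiny hypothetical samples, segmentation uses forward maximum matching (a classic dictionary-based technique; the article does not say which segmentation algorithm LIVAC uses), and parallel-text alignment (step 4) and manual verification and tagging (step 5) are left out. All function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set

# Toy simplified-to-traditional table (hypothetical). A real converter needs a full
# table plus context rules for one-to-many cases such as 发 -> 發 / 髮.
S2T: Dict[str, str] = {"国": "國", "经": "經", "济": "濟", "发": "發"}

# Toy lexicon of known words, in traditional characters (hypothetical).
LEXICON: Set[str] = {"國家", "經濟", "發展"}


def to_traditional(text: str) -> str:
    """Step 2: character-by-character unification to traditional characters."""
    return "".join(S2T.get(ch, ch) for ch in text)


def segment_fmm(text: str, lexicon: Set[str], max_len: int = 4) -> List[str]:
    """Step 3: forward maximum matching. At each position take the longest
    lexicon word (up to max_len characters), else a single character."""
    words: List[str] = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in lexicon:
                words.append(candidate)
                i += size
                break
    return words


def build_subcorpus(texts: List[str], lexicon: Set[str]) -> Counter:
    """Steps 2, 3 and 6 for one region: unify, segment, and count word frequencies."""
    counts: Counter = Counter()
    for text in texts:
        counts.update(segment_fmm(to_traditional(text), lexicon))
    return counts


def merge_subcorpora(subcorpora: Dict[str, Counter]) -> Counter:
    """Step 7: combine regional sub-corpora into one master frequency list."""
    master: Counter = Counter()
    for counts in subcorpora.values():
        master.update(counts)
    return master
```

Forward maximum matching is used here only because it is short and easy to follow; it is known to mis-segment ambiguous strings, which is one reason a manual verification step such as step 5 is needed in practice.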

Source: excerpted from Wikipedia, the free encyclopedia.